Day 25, Nov 1, 2013

Unit of Analysis

Consider class sizes at Macalester. Here's some data:

library(mosaic)  # supplies the formula interface (mean(~x, data=...), tally(), etc.) used below
classes = fetchData("courses.csv")
## Retrieving from http://www.mosaic-web.org/go/datasets/courses.csv
names(classes)
## [1] "sessionID" "dept"      "level"    
## [4] "sem"       "enroll"    "iid"
densityplot(~enroll, data=classes)

mean(~enroll, data=classes)
## [1] 21.17

These data are from the college's point of view: each row is one class. They are accurate, except that all classes with fewer than 10 students were dropped, because the data were collected for the purpose of studying grades.

The distribution is right-skewed, so the mean is bigger than the median.

median(~enroll, data=classes)
## [1] 18
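
A small made-up example shows why a long right tail pulls the mean above the median. The numbers are hypothetical, not taken from courses.csv, and the sketch uses base R only so it runs without the course data:

```r
# Hypothetical class sizes with one large lecture in the right tail
sizes <- c(10, 12, 15, 18, 20, 25, 60)

mean_size   <- mean(sizes)    # pulled upward by the class of 60
median_size <- median(sizes)  # the middle value; it would not change if 60 were 600

mean_size > median_size       # TRUE when the large values dominate the mean
```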

Maybe these data are log-normal:

densityplot(~log(enroll), data=classes)

exp(mean(~log(enroll), data=classes))
## [1] 19.16
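
The quantity exp(mean(log(x))) is the geometric mean of x. If log(enroll) is roughly normal, the geometric mean estimates the median of the distribution, which fits with 19.16 landing closer to the median (18) than to the mean (21.17). A minimal base-R sketch with hypothetical sizes (not from courses.csv) shows that the log form agrees with the "n-th root of the product" form:

```r
# Hypothetical class sizes (not from courses.csv)
sizes <- c(10, 20, 40)

geo_mean_log  <- exp(mean(log(sizes)))             # the form used above
geo_mean_root <- prod(sizes)^(1 / length(sizes))   # n-th root of the product

# The geometric mean is never larger than the arithmetic mean
c(geometric = geo_mean_log, arithmetic = mean(sizes))
```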

We can argue about whether the mean or median provides the better description of the typical class size. But it's more important to think about why one is interested in this at all.

Perspective One: Classes as the Unit of Analysis

Example question: What fraction of classes enroll 35 or more students?

tally( ~ enroll>=35, data=classes, format="proportion")
## 
##    TRUE   FALSE   Total 
## 0.08324 0.91676 1.00000

Perspective Two: Students as the Unit of Analysis

Suppose we transform the data to the student's point of view. A class of size 35 has 35 students, each of whom experiences a class of size 35, so the number 35 should appear 35 times. Similarly, for a class of size 10, the number 10 should appear 10 times, once for each of the 10 students.

This statement will do that (but you don't need to know a statement like this):

students = with(classes, rep(enroll, times=enroll))
mean(~students)
## [1] 27.13
median(~students)
## [1] 22
tally( ~ students>=35, format="proportion")
## 
##   TRUE  FALSE  Total 
## 0.1971 0.8029 1.0000
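
To see what the replication does, here is a minimal base-R sketch on three hypothetical classes (the enrollments are invented for illustration, not taken from courses.csv). It also checks that the student-view mean equals the mean of class sizes weighted by class size, so the long vector is a convenience rather than a necessity:

```r
# Hypothetical enrollments for three classes (not from courses.csv)
enroll <- c(10, 20, 60)

# One entry per student: each class size repeated once for each student in it
students <- rep(enroll, times = enroll)

class_view   <- mean(enroll)     # average over classes
student_view <- mean(students)   # average over students

# The same student-view mean without building the long vector:
# class sizes weighted by class size
weighted_view <- sum(enroll^2) / sum(enroll)

# Share of classes vs. share of students in a class of 35 or more
c(classes = mean(enroll >= 35), students = mean(students >= 35))
```

Only one class in three has 35 or more students here, but two students in three sit in such a class, because large classes contribute more students.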

The distribution:

densityplot(~students)

Back to the Busses

Getting to your job interview: HTML, RMD

Walk through the probability calculations.

Should the bus company report from the busses' or the passengers' point of view? They are both legitimate, but they serve different purposes.